Section: Software

SimJoin (Distributed Approximate Similarity Join)

Participant: Alexis Joly [contact].

SimJoin is distributed software for efficiently computing the full approximate k-nn graph of large collections of high-dimensional features. It is developed within a MapReduce framework and is therefore easily portable to large cloud computing platforms. It is based on recent theoretical contributions related to locality-preserving hash functions [34]. Its first main feature is to split a large collection of high-dimensional features into highly balanced pages that preserve locality according to any given similarity kernel. Its second main feature is to build, in O(n^(1+γ)) operations, a candidate set of item pairs that approximates the theoretical k-nn graph with high recall. This software is developed in collaboration with INRIA Imedia.
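
To make the idea of hash-based candidate generation concrete, here is a minimal single-machine sketch in Python. It is not SimJoin's code: it swaps in plain random-hyperplane hashing (which targets cosine similarity) for the locality-preserving hash functions of [34], it makes no attempt to balance the pages, and all names (`make_hyperplanes`, `hash_key`, `map_phase`, `reduce_phase`, `candidate_pairs`) and parameters (`n_bits`, `n_tables`) are hypothetical. The map step emits a hash key per item and the reduce step pairs up the items that share a key, which mirrors the shape of a MapReduce job.

```python
import random
from collections import defaultdict
from itertools import combinations


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes (illustrative hash family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_key(vec, planes):
    """One bit per hyperplane: the sign of the dot product with the vector."""
    return tuple(
        sum(p_i * v_i for p_i, v_i in zip(plane, vec)) >= 0.0
        for plane in planes
    )


def map_phase(items, planes):
    """Map step: emit (hash key, item id) for every (item id, vector)."""
    for item_id, vec in items:
        yield hash_key(vec, planes), item_id


def reduce_phase(keyed):
    """Reduce step: group ids by key and emit all pairs within each bucket."""
    buckets = defaultdict(list)
    for key, item_id in keyed:
        buckets[key].append(item_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def candidate_pairs(items, dim, n_bits=4, n_tables=3, seed=0):
    """Union the candidate pairs found by several independent hash tables.

    `items` must be a list (it is scanned once per table).
    """
    pairs = set()
    for t in range(n_tables):
        planes = make_hyperplanes(dim, n_bits, seed + t)
        pairs |= reduce_phase(map_phase(items, planes))
    return pairs
```

For example, calling `candidate_pairs([("a", [1.0, 0.0]), ("b", [1.0, 0.0]), ("c", [-1.0, 0.0])], dim=2)` pairs `a` with `b`, since identical vectors always share a key, while `c` points in the opposite direction and lands in a different bucket. Only pairs inside a bucket are ever compared, which is what keeps the cost below the quadratic cost of comparing all pairs; using more tables raises recall at the price of more candidates.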